88 results found.
Speech/Written
Corpus
Language Type:
Multilingual
Languages:
Catalan French Italian Portuguese
Availability:
Freely Available
License:
GNU
Size:
61.9 hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: Pretraining by Backtranslation for End-to-end ASR in Low-Resource Settings
- Paper track: 8.5 Novel neural network architectures (e.g. seque/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Matthew Wiesner | VoxForge French, Italian, Portuguese, and Catalan Subsets | /N |
Documentation:
Yes. It can be found at http://www.voxforge.org/
Speech
Corpus
Language Type:
Monolingual
Languages:
Arabic Bengali Central Khmer Chinese Dari Egyptian Arabic English Georgian Hindi Iranian Persian Italian Japanese Korean Lao Mandarin Chinese Min Nan Chinese Moroccan Arabic Northern Khmer Panjabi Persian Russian Spanish Tagalog Thai Tigrinya Urdu Uzbek Vietnamese Wu Chinese Yue Chinese
Availability:
From Data Center(s)
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- Paper track: 4.5 Speaker diarization/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yusuke Fujita | 2008 NIST Speaker Recognition Evaluation | /N |
Documentation:
None
Speech/Written
Corpus
Language Type:
Multilingual
Languages:
Basque French German Italian Portuguese Spanish
Availability:
From Data Center(s)
License:
META-SHARE and/or CC
Size:
1040 hours
Production Status:
Newly created-in progress
Use:
Speech Recognition/Understanding
- Paper title: Recognition of Latin American Spanish using Multi-task Learning
- Paper track: 8.12 Cross-lingual and multilingual/accent aspects/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Carlos Mendes | SAVAS META-SHARE repository | /N |
Documentation:
None
Speech/Written
Corpus
Language Type:
Multilingual
Languages:
Dutch English French German Italian Portuguese Romanian Spanish
Availability:
Freely Available
License:
Creative Commons Attribution-NonCommercial-NoDerivs 4.0 License
Size:
None
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: MuST-Cinema: a Speech-to-Subtitles corpus
- Paper track: Multimodality/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Alina Karakanta | MuST-Cinema | /N |
Documentation:
Documentation publicly available in English
Written
Corpus Tool
Language Type:
Multilingual
Languages:
Dutch English Italian
Availability:
Freely Available
License:
CreativeCommons
Size:
100000 sentences
Production Status:
Newly created-in progress
Use:
Semantic Role Labeling
- Paper title: Large-scale Cross-lingual Language Resources for Referencing and Framing
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Piek Vossen | MWEP toolkit | /N |
Documentation:
English
Multimodal/Multimedia
Corpus
Language Type:
Monolingual
Languages:
Italian
Availability:
Freely Available
License:
Size:
14 hours
Production Status:
Newly created-in progress
Use:
Corpus Creation/Annotation
- Paper title: Adding Gesture, Posture and Facial Displays to the PoliModal Corpus of Political Interviews
- Paper track: Multimodality/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Sara Tonelli | PoliModal corpus | /N |
Documentation:
Documentation available at the github link
Speech/Written
Corpus
Language Type:
Multilingual
Languages:
Arabic English French German Greek Italian Portuguese Russian Spanish
Availability:
Freely Available
License:
CC BY-NC-ND 4.0
Size:
200
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: The Multilingual TEDx Corpus for Speech Recognition and Translation
- Paper track: 12.6 Speech and multimodal resources/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Elizabeth Salesky | Multilingual TEDx (mTEDx) | /N |
Documentation:
None
Speech/Written
Corpus
Language Type:
Monolingual
Languages:
Arabic Catalan Chinese Dutch Estonian French German Indonesian Italian Japanese Latvian Mongolian Persian Portuguese Russian Slovenian Spanish Swedish Tamil Turkish Welsh
Availability:
Freely Available
License:
CC0
Size:
2880 hours
Production Status:
Newly created-in progress
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: CoVoST 2 and Massively Multilingual Speech Translation
- Paper track: 12.1 Spoken machine translation/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Juan Pino | CoVoST 2 | /N |
Documentation:
None
Speech
Corpus
Language Type:
Multilingual
Languages:
Arabic Bengali Dari Egyptian Arabic English Georgian Hindi Iranian Persian Italian Japanese Khmer Korean Lao Mandarin Chinese Min Nan Chinese Moroccan Arabic Panjabi Persian Russian Spanish Tagalog Thai Tigrinya Urdu
Availability:
From Owner
License:
LDC
Size:
640 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2008 NIST Speaker Recognition Evaluation | /N |
Documentation:
None
Speech
Corpus
Language Type:
Monolingual
Languages:
Arabic Bengali Dari Egyptian Arabic English Georgian Hindi Iranian Persian Italian Japanese Khmer Korean Lao Mandarin Chinese Min Nan Chinese Moroccan Arabic Panjabi Persian Russian Spanish Tagalog Thai Tigrinya Urdu
Availability:
From Owner
License:
LDC
Size:
950 hours
Production Status:
Existing-updated
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2008 NIST Speaker Recognition Evaluation Training Set Part 2 | /N |
Documentation:
None




